1,044 research outputs found

    TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

    Deep neural networks (DNNs) have become core computational components within low-latency Function as a Service (FaaS) prediction pipelines, including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation. Cloud computing, as the de facto backbone of modern computing infrastructure for both enterprise and consumer applications, must handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal, suffering from "cold start" latency. A major cause of this inefficiency is the need to move large amounts of model data within and across servers. We propose TrIMS as a novel solution to these issues. It consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy; an efficient resource management layer that provides isolation; and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework, showing up to 24x speedup in latency for image classification models and up to 210x speedup for large models, and up to 8x improvement in system throughput.
    Comment: In Proceedings CLOUD 201
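    The core idea of a persistent, tiered model store can be sketched as a cache hierarchy in which a hit in a slow tier promotes the model upward and evictions demote it downward. The class name, tier layout, and LRU policy below are illustrative assumptions, not TrIMS's actual API or replacement policy.

```python
from collections import OrderedDict

class TieredModelStore:
    """Toy sketch of a persistent model store spanning a memory hierarchy.

    Tiers are ordered fastest-first (e.g. GPU, CPU, disk). A hit in any
    tier promotes the model to the fastest tier; overflow evicts the
    least-recently-used entry down to the next tier (or drops it from the
    slowest tier, modeling a re-fetch from cloud storage).
    """

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_models), fastest tier first
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def put(self, model_id, weights, tier=0):
        name, cap, store = self.tiers[tier]
        store[model_id] = weights
        store.move_to_end(model_id)          # mark as most recently used
        if len(store) > cap:                 # evict LRU entry downward
            victim, data = store.popitem(last=False)
            if tier + 1 < len(self.tiers):
                self.put(victim, data, tier + 1)

    def get(self, model_id):
        for name, cap, store in self.tiers:
            if model_id in store:
                weights = store.pop(model_id)
                self.put(model_id, weights, 0)   # promote to fastest tier
                return weights, name             # tier where it was found
        return None, None                        # cold miss: load from cloud
```

    A lookup that hits the CPU tier avoids the "cold start" model load entirely; only a miss in every tier pays the full cost the abstract describes.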

    ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

    Automatic code optimization is a complex process that typically applies multiple discrete algorithms, each modifying the program structure irreversibly. The design of these algorithms is often monolithic, and similar analyses must be re-implemented repeatedly because the algorithms do not cooperate. To address this issue, modern optimization techniques such as equality saturation allow exhaustive term rewriting at various levels of the input, thereby simplifying compiler design. In this paper, we apply equality saturation to optimize the sequential code used in directive-based programming for GPUs. Our approach simultaneously achieves less computation, less memory access, and high memory throughput. Our fully automated framework constructs single-assignment forms from the input so that it can be rewritten in its entirety while preserving dependencies, and then extracts the optimal candidates. Through practical benchmarks, we demonstrate significant performance improvements with several compilers. Furthermore, we highlight the advantages of computational reordering and emphasize the significance of memory-access order for modern GPUs.
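    The principle behind equality saturation, as opposed to destructive rewriting, is to keep every equivalent form of an expression and pick the cheapest afterwards. The greatly simplified sketch below grows a set of equivalent expressions from a few rewrite rules and extracts by node count; a real implementation (such as the framework in the paper) would use e-graphs to share subterms, and the rules and cost model here are illustrative assumptions.

```python
# Rewrite rules: each maps a node to one equivalent node, or None.
def r_add_zero(e):   # a + 0 -> a
    if isinstance(e, tuple) and e[0] == '+' and e[2] == 0:
        return e[1]
    return None

def r_mul_two(e):    # a * 2 -> a + a (both forms are kept, not replaced)
    if isinstance(e, tuple) and e[0] == '*' and e[2] == 2:
        return ('+', e[1], e[1])
    return None

RULES = [r_add_zero, r_mul_two]

def variants(e):
    """All expressions reachable by applying one rule anywhere in e."""
    out = set()
    for rule in RULES:
        r = rule(e)
        if r is not None:
            out.add(r)
    if isinstance(e, tuple):
        op, a, b = e
        out.update((op, va, b) for va in variants(a))
        out.update((op, a, vb) for vb in variants(b))
    return out

def saturate(e, limit=1000):
    """Grow the set of known-equivalent expressions to a fixed point."""
    seen, frontier = {e}, {e}
    while frontier and len(seen) < limit:
        frontier = {v for f in frontier for v in variants(f)} - seen
        seen |= frontier
    return seen

def cost(e):   # simple cost model: count nodes
    return 1 + cost(e[1]) + cost(e[2]) if isinstance(e, tuple) else 1

def optimize(e):
    """Extract the cheapest expression from the saturated set."""
    return min(saturate(e), key=cost)
```

    Because no rewrite is destructive, ordering problems between the rules disappear: `(x * 2) + 0` reaches both `x * 2` and `x + x`, and extraction chooses among all of them at once.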

    JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization

    The rapid development of computing technology has paved the way for directive-based programming models to take a principal role in maintaining the software portability of performance-critical applications. Such models aim for minimal engineering cost when enabling computational acceleration on multiple architectures: programmers are only required to add meta information atop sequential code. Obtaining the best possible efficiency, however, is often challenging. Directives inserted by the programmer can have side effects that limit the compiler optimizations available, which can degrade performance. This is exacerbated when targeting multi-GPU systems, as pragmas do not automatically adapt to such systems and require expensive, time-consuming code adjustment by programmers. This paper introduces JACC, an OpenACC runtime framework that enables the dynamic extension of OpenACC programs by serving as a transparent layer between the program and the compiler. We add a versatile code-translation method for multi-device utilization by which manually optimized applications can be distributed automatically while keeping the original code structure and parallelism. In some cases we show nearly linear scaling of kernel execution on NVIDIA V100 GPUs. When adaptively using multiple GPUs, the resulting performance improvements amortize the latency of GPU-to-GPU communication.
    Comment: Extended version of a paper to appear in: Proceedings of the 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), December 17-18, 202

    OmpSs-2 and OpenACC interoperation

    We propose an interoperation mechanism to enable novel composability across pragma-based programming models. We study and propose a clear separation of duties, and implement our approach by augmenting the OmpSs-2 programming model, compiler, and runtime system to support OmpSs-2 + OpenACC programming. To validate our proposal we port ZPIC, a kinetic plasma simulator, to our hybrid OmpSs-2 + OpenACC implementation. We compare our approach against OpenACC versions of ZPIC on a multi-GPU HPC system, showing that it provides automatic asynchronous and multi-GPU execution, removing a significant burden from the application's developer, while also outperforming manually programmed versions thanks to better utilization of the hardware.
    This work has been part of the EPEEC project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 801051. This paper was also partially funded by the Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033). We gratefully acknowledge the support of the NVIDIA AI Technology Center (NVAITC) Europe, which provided us remote access to an NVIDIA DGX-1.
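    The "automatic asynchronous execution" the abstract claims can be pictured with a host-side analogy: independent kernels are submitted as tasks to a pool with one worker per device, and the runtime, not the programmer, overlaps them and collects results in order. This is only an illustrative analogy in Python's standard library, not the OmpSs-2 runtime API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tasks_async(kernels, num_devices):
    """Submit each zero-argument 'kernel' as a task; a pool with one
    worker per device overlaps independent kernels, roughly as a
    task-based runtime dispatches device work without explicit
    programmer-inserted synchronization. Results come back in
    submission order regardless of completion order.
    """
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        futures = [pool.submit(k) for k in kernels]
        return [f.result() for f in futures]
```

    In the real system, data dependencies declared on OmpSs-2 tasks would additionally order only the kernels that actually conflict, which is what lets the runtime outperform manual synchronization.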

    Measurement of the cosmic ray spectrum above 4×10^18 eV using inclined events detected with the Pierre Auger Observatory

    A measurement of the cosmic-ray spectrum for energies exceeding 4×10^18 eV is presented, based on the analysis of showers with zenith angles greater than 60° detected with the Pierre Auger Observatory between 1 January 2004 and 31 December 2013. The measured spectrum confirms a flux suppression at the highest energies. Above 5.3×10^18 eV, the "ankle", the flux can be described by a power law E^(-γ) with index γ = 2.70 ± 0.02 (stat) ± 0.1 (sys), followed by a smooth suppression region. For the energy E_s at which the spectral flux has fallen to one-half of its extrapolated value in the absence of suppression, we find E_s = (5.12 ± 0.25 (stat) +1.0/−1.2 (sys)) × 10^19 eV.
    Comment: Replaced with published version. Added journal reference and DOI
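    The definition of E_s can be made concrete numerically: given a smooth suppression factor multiplying the power law, E_s is the energy where the measured flux drops to half the unsuppressed extrapolation. The suppression shape used below, S(E) = 1 / (1 + (E/E_s)^Δγ), is an illustrative assumption chosen so that the half-flux energy is E_s exactly; it is not the fit function used by the Auger Collaboration.

```python
def half_flux_energy(E_s, dgamma, lo=1e18, hi=1e21, tol=1e12):
    """Find by bisection the energy (in eV) where the flux falls to half
    of the pure power-law extrapolation, for the illustrative smooth
    suppression factor S(E) = 1 / (1 + (E / E_s)**dgamma).

    S(E) is strictly decreasing in E for dgamma > 0, so bisection on
    S(E) = 0.5 is valid; with this S the answer is E_s by construction.
    """
    S = lambda E: 1.0 / (1.0 + (E / E_s) ** dgamma)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if S(mid) > 0.5:     # still above half: half-flux point is higher
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

    Plugging in the quoted value E_s = 5.12×10^19 eV recovers it to within the bisection tolerance, confirming that the "half of the extrapolated flux" definition and the suppression factor are consistent.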

    Energy Estimation of Cosmic Rays with the Engineering Radio Array of the Pierre Auger Observatory

    The Auger Engineering Radio Array (AERA) is part of the Pierre Auger Observatory and is used to detect the radio emission of cosmic-ray air showers. These observations are compared to the data of the surface detector stations of the Observatory, which provide well-calibrated information on the cosmic-ray energies and arrival directions. The response of the radio stations in the 30 to 80 MHz regime has been thoroughly calibrated to enable the reconstruction of the incoming electric field. For the latter, the energy deposit per area is determined from the radio pulses at each observer position and is interpolated using a two-dimensional function that takes into account signal asymmetries due to interference between the geomagnetic and charge-excess emission components. The spatial integral over the signal distribution gives a direct measurement of the energy transferred from the primary cosmic ray into radio emission in the AERA frequency range. We measure 15.8 MeV of radiation energy for a 1 EeV air shower arriving perpendicularly to the geomagnetic field. This radiation energy, corrected for geometrical effects, is used as a cosmic-ray energy estimator. Performing an absolute energy calibration against the surface-detector information, we observe that this radio-energy estimator scales quadratically with the cosmic-ray energy as expected for coherent emission. We find an energy resolution of the radio reconstruction of 22% for the data set and 17% for a high-quality subset containing only events with at least five radio stations with signal.
    Comment: Replaced with published version. Added journal reference and DOI
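    The quadratic scaling stated in the abstract, E_rad = A · (E_CR / 1 EeV)², can be inverted directly to turn a measured radiation energy into a cosmic-ray energy estimate. The sketch below uses the quoted normalization A = 15.8 MeV and omits the geometrical (geomagnetic-angle) corrections the paper applies, so it is a simplified illustration rather than the full estimator.

```python
import math

# Radiation energy (MeV) for a 1 EeV shower arriving perpendicularly to
# the geomagnetic field, as quoted in the abstract.
A_RAD_MEV = 15.8

def cosmic_ray_energy_eev(radiation_energy_mev):
    """Invert E_rad = A * (E_cr / 1 EeV)^2 to estimate the cosmic-ray
    energy in EeV from the measured radiation energy in MeV.

    Coherent emission makes the radiated energy grow with the square of
    the electromagnetic cascade energy, hence the square root here.
    """
    return math.sqrt(radiation_energy_mev / A_RAD_MEV)
```

    For example, four times the reference radiation energy corresponds to twice the cosmic-ray energy, which is exactly what coherent (amplitude-proportional) emission predicts.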

    Measurement of the Radiation Energy in the Radio Signal of Extensive Air Showers as a Universal Estimator of Cosmic-Ray Energy

    We measure the energy emitted by extensive air showers in the form of radio emission in the frequency range from 30 to 80 MHz. Exploiting the accurate energy scale of the Pierre Auger Observatory, we obtain a radiation energy of 15.8 ± 0.7 (stat) ± 6.7 (sys) MeV for cosmic rays with an energy of 1 EeV arriving perpendicularly to a geomagnetic field of 0.24 G, scaling quadratically with the cosmic-ray energy. A comparison with predictions from state-of-the-art first-principle calculations shows agreement with our measurement. The radiation energy provides direct access to the calorimetric energy in the electromagnetic cascade of extensive air showers. Comparison with our result thus allows the direct calibration of any cosmic-ray radio detector against the well-established energy scale of the Pierre Auger Observatory.
    Comment: Replaced with published version. Added journal reference and DOI. Supplemental material in the ancillary file

    The Standard European Vector Architecture (SEVA): a coherent platform for the analysis and deployment of complex prokaryotic phenotypes

    The 'Standard European Vector Architecture' database (SEVA-DB, http://seva.cnb.csic.es) was conceived as a user-friendly, web-based resource and a material clone repository to assist in the choice of optimal plasmid vectors for de-constructing and re-constructing complex prokaryotic phenotypes. The SEVA-DB adopts simple design concepts that facilitate the swapping of functional modules and the extension of genome engineering options to microorganisms beyond typical laboratory strains. Under the SEVA standard, every DNA portion of the plasmid vectors is minimized, edited for flaws in sequence and/or functionality, and endowed with physical connectivity through three inter-segment insulators that are flanked by fixed, rare restriction sites. Such a scaffold enables the exchange of multiple origins of replication and diverse antibiotic selection markers, shaping a frame for further combination with a large variety of cargo modules for varied end-applications. The core collection of constructs available at the SEVA-DB has been produced as a starting point for the further expansion of the formatted vector platform. We argue that adoption of the SEVA format can become a shortcut to fill the phenomenal gap between the existing power of DNA synthesis and the actual engineering of predictable and efficacious bacteria.
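    Because the standard relies on fixed, rare restriction sites at the module boundaries, a cargo module's internal sequence must avoid those sites or module swapping becomes ambiguous. The sketch below screens a sequence on both strands; the four enzymes listed are real rare cutters with their standard recognition sequences, but whether each one is among SEVA's fixed inter-segment sites is an assumption that should be checked against the SEVA-DB itself.

```python
# Illustrative rare restriction-enzyme recognition sites (all palindromic,
# so the reverse-complement check below is a safety net for non-palindromic
# additions rather than a necessity for these four).
RARE_SITES = {
    "PacI": "TTAATTAA",
    "SwaI": "ATTTAAAT",
    "AscI": "GGCGCGCC",
    "FseI": "GGCCGGCC",
}

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq))

def forbidden_sites(cargo_seq):
    """Return the sorted names of rare sites found on either strand of a
    candidate cargo sequence; a standard-compatible cargo module should
    return an empty list so the fixed boundary sites stay unique."""
    seq = cargo_seq.upper()
    hits = []
    for name, site in RARE_SITES.items():
        if site in seq or revcomp(site) in seq:
            hits.append(name)
    return sorted(hits)
```

    A synthesis order or module submission could run this check first and redesign codons around any hit before the construct ever reaches the bench.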

    Consensus standards for acquisition, measurement, and reporting of intravascular optical coherence tomography studies

    Objectives: The purpose of this document is to make the output of the International Working Group for Intravascular Optical Coherence Tomography (IWG-IVOCT) Standardization and Validation available to medical and scientific communities, through a peer-reviewed publication, in the interest of improving the diagnosis and treatment of patients with atherosclerosis, including coronary artery disease. Background: Intravascular optical coherence tomography (IVOCT) is a catheter-based modality that acquires images at a resolution of ∼10 Όm, enabling visualization of blood vessel wall microstructure in vivo at an unprecedented level of detail. IVOCT devices are now commercially available worldwide, there is an active user base, and the interest in using this technology is growing. Incorporation of IVOCT in research and daily clinical practice can be facilitated by the development of uniform terminology and consensus-based standards on use of the technology, interpretation of the images, and reporting of IVOCT results. Methods: The IWG-IVOCT, comprising more than 260 academic and industry members from Asia, Europe, and the United States, formed in 2008 and convened on the topic of IVOCT standardization through a series of 9 national and international meetings. Results: Knowledge and recommendations from this group on key areas within the IVOCT field were assembled to generate this consensus document, authored by the Writing Committee, composed of academicians who have participated in meetings and/or writing of the text. Conclusions: This document may be broadly used as a standard reference regarding the current state of the IVOCT imaging modality, intended for researchers and clinicians who use IVOCT and analyze IVOCT data.
